Subject: Re: ODM Web Services
From: David Valentine <valentin@sdsc.edu>
Date: Tue, 27 Feb 2007 18:41:37 -0800
To: Jeff Horsburgh <jeffh@cc.usu.edu>
CC: "'Sharon A. Bernard'" <Sharon.Bernard@engr.utexas.edu>, blair@sdsc.edu, 'Bora Boran' <bb63@drexel.edu>, 'Catharine van Ingen' <vaningen@windows.microsoft.com>, 'Cedric David' <cedric.david@mail.utexas.edu>, 'David Maidment' <maidment@mail.utexas.edu>, dtarb@cc.usu.edu, eto@mail.utexas.edu, zaslavsk@sdsc.edu, jduncan@cuahsi.org, jon.goodall@duke.edu, 'Kim Schreuders' <kimas@cc.usu.edu>, Michael.Piasecki@drexel.edu, rhooper@cuahsi.org, twhit@mail.utexas.edu, tsignorellov@mail.utexas.edu, 'Zhumei Qian' <zqian@esri.com>

Jeff,
A new version of the ODM with a flag, UseNetworkAndVocabulary, is on water.
http://water.sdsc.edu/genericodws.zip
I have not run it through the release procedures, so hopefully it works. This should let you work with the ODM tools.

Jeff said:
------

It seems that we have a couple of options: 1) we make the web services accept additional parameters in the GetVariableInfo and GetValues calls such that an ODM data series can be uniquely identified, or 2) we make more robust data discovery methods that allow us to return a VariableID associated with the set of attributes that make a Variable unique (i.e., VariableCode, VariableUnits, SampleMedium, ValueType, IsRegular, TimeSupport, DataType, GeneralCategory) and then use the VariableID in the GetVariableInfo and GetValues call.

-----

My response to these questions has two parts. The first covers the present status, how to handle the request to retrieve by variableID, and other options. The second offers some thoughts on how the coupling of Network/SiteCode and Vocabulary/VariableCode should ideally be handled.

-----
PRESENT STATUS:
We already have this problem: one variableCode can return multiple rows, even for the same Network/Vocabulary. The EPA is only useful with sample medium. NWIS has been split into multiple services, but multiple variables would be another solution.

We should spec out an official way to do this with the variable parameter, e.g.:
VOCAB:TERM\SampleMedium=YYYY\MethodName=ZZZZ\UnitsAbbreviation=UUU
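A minimal sketch of how a web service might split such a parameter into its parts. The function name, the case-insensitive handling of attribute names, and the error behavior are assumptions made for illustration; nothing here is part of the released services.

```python
def parse_variable_param(param):
    r"""Split 'VOCAB:TERM\Name=Value\...' into (vocabulary, term, attributes)."""
    # The first backslash-separated piece is VOCAB:TERM; the rest are qualifiers.
    head, *qualifiers = param.split("\\")
    vocabulary, sep, term = head.partition(":")
    if not sep or not vocabulary or not term:
        raise ValueError(f"expected VOCAB:TERM, got {head!r}")

    attributes = {}
    for qualifier in qualifiers:
        name, sep, value = qualifier.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected Name=Value, got {qualifier!r}")
        # Assumption: attribute names are matched case-insensitively.
        attributes[name.strip().lower()] = value.strip()
    return vocabulary, term, attributes


if __name__ == "__main__":
    print(parse_variable_param(r"NWIS:00060\SampleMedium=Surface Water\UnitsAbbreviation=cfs"))
```

Returning the attributes as a dictionary means the order in which the caller lists them does not matter to later matching.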

As for VariableID, we could also add a keyword approach, where a certain vocabulary term means "return this variableID". For example, we could say that a variable submitted to a web service with a vocabulary of "VARID" would return the variableInfo with that internal variableID:
VARID:NNNNNNNN

This variableID will be valid only for this specific OD or web service. This is what we now do for geometries in the location parameter: a network of "GEOM" says that this is a geometry, and not a siteID.
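The keyword idea could be sketched roughly as follows. This assumes "VARID" is treated as a reserved vocabulary, in the same spirit as "GEOM" for locations; the function name and the returned tuples are hypothetical and only illustrate the dispatch.

```python
RESERVED_VARID = "VARID"  # proposed keyword; not an existing vocabulary


def classify_variable_param(param):
    """Decide whether a variable parameter names an internal ID or a coded term."""
    prefix, sep, rest = param.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected PREFIX:VALUE, got {param!r}")
    if prefix.upper() == RESERVED_VARID:
        # int() raises ValueError if the ID is not numeric.
        return ("variable_id", int(rest))
    # Anything else is an ordinary Vocabulary:VariableCode pair.
    return ("variable_code", prefix, rest)


if __name__ == "__main__":
    print(classify_variable_param("VARID:42"))
    print(classify_variable_param("NWIS:00060"))
```

Because the ID is only meaningful inside one OD or web service, a caller would need to obtain it from that same service before using it.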

In order to do the above, the web services will need to be coded to look for attributes added to the variable request, and to attempt to select the correct variable.
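One way the selection step might look, as a sketch only. The rows are made-up stand-ins for records in the Variables table; the column names follow the attributes mentioned in this thread, and VocabularyCode assumes the column proposed later in this message.

```python
def select_variables(rows, vocabulary, code, attributes):
    """Return the rows matching the code and every attribute the caller supplied."""
    def matches(row):
        if row["VocabularyCode"] != vocabulary or row["VariableCode"] != code:
            return False
        # Every requested attribute must match; unrequested ones are ignored.
        return all(
            str(row.get(name, "")).lower() == value.lower()
            for name, value in attributes.items()
        )
    return [row for row in rows if matches(row)]


# Hypothetical rows: one VariableCode shared by three data series.
ROWS = [
    {"VariableID": 1, "VocabularyCode": "NWIS", "VariableCode": "00060", "DataType": "Average"},
    {"VariableID": 2, "VocabularyCode": "NWIS", "VariableCode": "00060", "DataType": "Continuous"},
    {"VariableID": 3, "VocabularyCode": "NWIS", "VariableCode": "00060", "DataType": "Sporadic"},
]

if __name__ == "__main__":
    print(len(select_variables(ROWS, "NWIS", "00060", {})))
    print(select_variables(ROWS, "NWIS", "00060", {"DataType": "Average"}))
```

If more than one row is still returned after filtering, the service would have to decide whether to return all of them or report the request as ambiguous; that policy is left open here.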

----
ISSUES ON TIGHTLY COUPLING Network/SiteCode AND Vocabulary/VariableCode

I agree that we need tracking of Networks and Vocabulary, but I disagree with how it is proposed to be implemented. I don't feel that concatenating qualifier and term into a single column is the best approach. Yes, we need to present this to the user, but concatenating two items into a single field and storing them in a single database column is not ideal. My problem with this approach is that we lose some query functionality: the combined data no longer exist as separate columns in the database. And as noted above, in order to differentiate, and to communicate to the user, we may need to include not only the Vocabulary and VariableCode but other attributes as well. I think that we should add a column called "VariableCodeExternal." "VariableCodeExternal" would store the reference that would be sent to the user as the suggested way to reference this variable. Internally, we would need to use the database, because the user

I agree with tightly coupling Network/SiteCode and Vocabulary/VariableCode for the MyDB format. We want fewer columns, and attributes tightly bonded, since it is basically a flat file. We should output the uniquely identified VariableCodes that are defined in VariableCodeExternal.

For the OD database, I would recommend a more flexible representation. This can be accomplished by adding columns to the Sites and Variables tables.

Sites add:
sourceID,NetworkCode,SiteCodeExternal
Vocabulary add:
sourceID,VocabularyCode,VariableCodeExternal

It has the same effect, since a Vocabulary and VariableCode are in the same row. But we have added provenance to the code with the sourceID, and made the programmer responsible for doing the splitting and combining. And we have not lost information by combining multiple fields into a single column; we only generate a unique reference, XxxxCodeExternal. The unique reference is really a convenience, since the programmer should split the input into its component parts and query a table in the database for the appropriate value. The order of the attributes might not be the same, so, again, the programmer of the web service should not do a straight match.
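A rough sketch of the split-and-query idea, using an in-memory SQLite table as a stand-in for the OD Variables table. The table layout, the sample rows, the backslash-separated reference format, and the list of filterable columns are all assumptions made for illustration.

```python
import sqlite3

# Assumption: only these attribute columns may appear in a reference.
ALLOWED_ATTRIBUTES = {"SampleMedium", "DataType"}


def find_variable_ids(conn, reference):
    """Split an external reference into parts and query on separate columns."""
    head, *qualifiers = reference.split("\\")
    vocabulary, _, code = head.partition(":")
    clauses = ["VocabularyCode = ?", "VariableCode = ?"]
    params = [vocabulary, code]
    for qualifier in qualifiers:
        name, _, value = qualifier.partition("=")
        if name not in ALLOWED_ATTRIBUTES:
            raise ValueError(f"unsupported attribute: {name!r}")
        clauses.append(f"{name} = ?")  # name is whitelisted above
        params.append(value)
    sql = ("SELECT VariableID FROM Variables WHERE "
           + " AND ".join(clauses) + " ORDER BY VariableID")
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection():
    """Build a small, made-up Variables table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, SourceID INTEGER, "
        "VocabularyCode TEXT, VariableCode TEXT, SampleMedium TEXT, DataType TEXT, "
        "VariableCodeExternal TEXT)"
    )
    conn.executemany("INSERT INTO Variables VALUES (?,?,?,?,?,?,?)", [
        (1, 1, "NWIS", "00060", "Surface Water", "Average",
         r"NWIS:00060\SampleMedium=Surface Water\DataType=Average"),
        (2, 1, "NWIS", "00060", "Surface Water", "Continuous",
         r"NWIS:00060\SampleMedium=Surface Water\DataType=Continuous"),
        (3, 2, "USU", "10", "Surface Water", "Continuous",
         r"USU:10\SampleMedium=Surface Water\DataType=Continuous"),
    ])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    # Attributes are given in a different order than the stored reference.
    print(find_variable_ids(conn, r"NWIS:00060\DataType=Average\SampleMedium=Surface Water"))
```

Because the lookup runs against the separate columns rather than comparing the incoming string with VariableCodeExternal, a reference whose attributes arrive in a different order still resolves to the same row.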




Jeff Horsburgh wrote:
>
> David V. and Ilya,
>
> I downloaded the ODM web services and got them up and running on a development machine. It took me a while to figure out that I had to add permissions for the ASP.Net account to my SQL Server database, so you may want to add that to the instructions. I was happy that it didn't take me too long to get things running! However, I have come up with a couple of issues because I can only currently get one of the methods to work on my testing database.
>
> When I did my ODM Tools demo on the HIS conference call, David Maidment requested that for VariableCodes and SiteCodes I put in the ODM database a prefix that indicates where those codes come from. An example of how I went about this is the following for water temperature from the USGS NWIS system:
>
> SiteCode: NWIS:10109000
>
> SiteName: Logan River Near Logan, UT
>
> VariableCode: NWIS:00010
>
> VariableName: Temperature, water
>
> The above is what is in my testing database. Now, to the first issue: in the ODM Web services you tack on an additional network code to the SiteCode and VariableCode and require it in the parameter for the method calls. For example, to use GetSiteInfo for the above I would have to pass it ODM:NWIS:10109000 as the site code, where the ODM part comes from the setting in the web.config file. This seems to work for the GetSiteInfo method, but I can't get any of the other methods to work unless I remove the NWIS part of the string. For example, if I change my database so that the VariableCode for temperature is 00010 instead of NWIS:00010, the GetVariableInfo method will work.
>
> Up to this point it has probably been OK to assume that the data within an ODM instance comes from a single network (i.e., NWIS, STORET, etc.). I assume that is how you have the other web service catalogs set up (i.e., a separate ODM database for each), and I also assume that is why you have the network information in the web.config file of the web services and not based on information in the database. However, these are just assumptions since I don't really know what is going on under the hood. For the test beds and for observatories, though, it is very likely that they will have multiple networks contributing data to the same instance of ODM. Hence the network information needs to be with the SiteCodes and VariableCodes in the database and not on top of the web services. Based on David Maidment's request that we identify where the VariableCodes come from, if we have USGS data and data collected by USU in a single ODM database, and both are collecting water temperature data, we might have something like:
>
> VariableCode: NWIS:00010 - for water temperature data collected by USGS
>
> VariableCode: USU:10 - for water temperature data collected by USU
>
> The above also brings me to the second issue. The VariableCodes in the ODM are not necessarily unique. For example, I can get NWIS:00060 from the USGS NWIS daily values, I can get NWIS:00060 from the NWIS realtime data, and I can get NWIS:00060 from the instantaneous irregular data. If the test bed people choose to put all of this in the same ODM database, simply passing a single VariableCode (i.e., NWIS:00060) to the web service would lead to ambiguous results: which one do you return? In each of these cases, the Variables represented by a VariableCode of NWIS:00060 are not uniquely distinguished by their VariableCode, but they are uniquely qualified by a combination of attributes in the Variables table, such as their TimeSupport, their DataType, their ValueType, etc. ODM does not use the USGS convention of collapsing all variable attributes onto one VariableCode. We do have an essentially equivalent concept in the VariableID field, but the VariableIDs are somewhat arbitrary since they are just unique integers and could be different from ODM to ODM.
>
> Many of the test beds may choose not to put USGS data in their ODM database because you have the NWIS web services, but an equally likely scenario is the following. A test bed PI at USU is collecting continuous water temperature data. He puts the raw (Level 0) data into ODM with a VariableCode of USU:10, which is a hypothetical variable code that he assigns to the raw water temperature data. He then uses ODM Tools to create a Level 1 quality controlled data series from the Level 0 data. This new data series is added to the same ODM instance, and yes, he assigns it the same VariableCode (USU:10). The only thing that has changed is the QualityControlLevel of the data. He then creates a daily average temperature data series from the Level 1 data. Again, he assigns it a VariableCode of USU:10 because the variable didn't change, but the DataType, QualityControlLevel, the Method, and the TimeSupport changed. You get the idea.
>
> It seems that we have a couple of options: 1) we make the web services accept additional parameters in the GetVariableInfo and GetValues calls such that an ODM data series can be uniquely identified, or 2) we make more robust data discovery methods that allow us to return a VariableID associated with the set of attributes that make a Variable unique (i.e., VariableCode, VariableUnits, SampleMedium, ValueType, IsRegular, TimeSupport, DataType, GeneralCategory) and then use the VariableID in the GetVariableInfo and GetValues call.
>
> Sorry for the long email. What do you guys think?
>
> Jeff Horsburgh
>
> Environmental Management Research Group
>
> Utah Water Research Laboratory
>
> 8200 Old Main Hill
>
> Logan, UT 84322-8200
>
> Phone: 435-797-2946
>
> Fax: 435-797-3663
>
> jeff.horsburgh@usu.edu <mailto:jeff.horsburgh@usu.edu>
>


-- 
David Valentine
GIS Programmer
Room 469
San Diego Supercomputer Center
Univ of Calif, San Diego MC 0505
La Jolla, CA 92093-0505

phone: 858-822-0923
email: valentin@sdsc.edu





